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Abstract 

A generalized-ensemble technique, multicanonical sampling, is used to study the folding of a 34- 
residue human parathyroid hormone fragment. An all-atom model of the peptide is employed and 
the protein-solvent interactions are approximated by an implicit solvent. Our results demonstrate 
that generalized-ensemble simulations are well suited to sample low-energy structures of such large 
polypeptides. Configurations with a root-mean-square deviation (rmsd) to the crystal structure of 
less than one A are found. Finally, we discuss limitations of our implicit solvent model. 
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I. INTRODUCTION 



The successful deciphering of the human genome has aggravated an old challenge in pro- 
tein science: for most of the resolved protein sequences we do not know the corresponding 
structures and functions. Computer experiments offer one way to evaluate the sequence- 
structure relationship but are extremely difficult for realistic protein models where interac- 
tions among all atoms are taken into account. The complex form of the intramolecular forces 
and of the interaction with the solvent, containing both repulsive and attractive terms, leads 
to a very rough energy landscape with a huge number of local minima. These minima are 
separated by energy barriers that are much higher than the typical thermal energy of a pro- 
tein (of order ksT) at room temperature. Hence, simple canonical Monte Carlo or molecular 
dynamics simulations will get trapped in a local minimum and often not thermalize within 
a finite amount of available CPU time. While this multiple minima problem does not nec- 
essarly inhibits molecular dynamics or Monte Carlo studies of peptides and proteins^, it 
restricts calculation of accurate thermodynamic averages to small peptides 3 . 

A number of novel simulation techniques have been proposed for overcoming this multiple- 
minima problem (for a review, see Ref. U). Important recent examples can be found in 

ering, also known as replica exchange method 
and introduced to protein science in Ref. that has become increasingly popular over the 
last few year a 8 i 9 i 10 . Parallel tempering is only one example of a class of new and sophisticated 
algorithms commonly summarized as generalized- ensemble methods^. In this article, we are 
concerned with another generalized-ensemble technique, multicanonical sampling 12 , that was 
first applied to the protein-folding problem in Ref. [13j Its usefullnes for calculating reliable 
thermodynamic quantities at low temperatures has been established for small peptides of up 
to ~ 20 residue o 13 i 14 i 15 . However, stable domains in proteins consists usually of 40-200 amino 
acids. Hence, it is necessary to evaluate the effectiveness of this approach for larger molecules 
than the peptides investigated so far. For this purpose, we have performed generalized- 
ensemble simulations of the peptide fragment PTH(l-34) corresponding to residues 1-34 of 
human parathyroid hormon o 16 i 17 i 18 . 

The 84-amino acid human parathyroid hormone is involved in the regulation of the cal- 
cium level in blood and influences bone formation^. The NH 2 -terminal 34 residues of the 
hormone, further on refered to as PTH(l-34), are sufficient for the biological activities of this 
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hormone suggesting medical and pharmaceutical applications of the peptide 2 ^. The crystal 



lix (PDB code 1ET1) 17 . NMR studies of the peptide in solution under near physiological 
conditions (PDB code lZWA)^ and in 20 % trifiuorethanol solution (PDB code lHPY)^ 
rather indicate an ensemble of structures that have in common two helices separated by a 
disordered region. 

In the present article, we try to overcome the problems of previous simulated annealing 
simulations of PTH(1-34) 21 », that did not allow a detailed structure evaluation, by using 
multicanonical sampling^, one of the most prominent generalized-ensemble techniques. An 
all-atom representation of the molecule is employed and the intramolecular interactions are 
described by the ECEPP/3 force field 22 . The protein-solvent interactions are approximated 
by the solvent accessible surface term of Ooi et alM Quantities such as the average helicity, 
number of contacts, average energy, and specific heat are calculated. Our results demonstrate 
the feasibility of generalized-ensemble simulations for large molecules such such as PTH(1- 
34). In addition, they indicate that with the advent of these and other modern search 
techniques, structure prediction of proteins is limited more by current energy functions (and 
especially solvent approximations) than by the simulation algorithms. 

II. METHODS 

Our research into the thermodynamics of PTH(l-34) is based on a detailed, all-atom 
representation of that peptide. The interactions between the atoms are described by a 
standard force field, ECEPP/3 2 ^ (as implemented in the program package SMMP 24 -), and 
are given by the sum of the electrostatic term Eq, the Lennard- Jones energy Elj, hydrogen- 
bond term E^b for all pairs of atoms in the peptide together with the torsion term E tor for 
all torsion angles: 



structure of the peptide has been resolved at 0.9 A and resembles a slightly bend long he- 



Eecepp/2 — Ec + El j + Ejjb + E tor , 




(1) 
(2) 



(3) 
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Etor = £tf,(l±cos(n,x,))- (5) 
I 

Here, (in A) is the distance between the atoms i and j, xi is the torsion angle for the 
chemical bond I and n\ characterizes its symmetry. The charges and force field parameters 
q iy Aij, Bij,Cij, Dij,U[ were calculated from crystal structures of amino acids using semi- 
empirical methods. The dielectricity constant is set to e — 2, its common value in ECEPP 
calculations. Since the bond lengths are fixed in ECEPP, the backbone torsion angles <f),ip,uj 
and the side chain torsion angles \ are the true degrees of freedom. Hence, in a Monte 
Carlo (MC) sweep single angles are updated sequentially by the Metropolis algorithm. The 
protein-water interactions are approximated by a solvent-accessible surface term 

E S olv = ^ ° iAi ' ( 6 ) 

i 

where Ai is the solvent accessible surface area of the i—th atom in the present configuration, 
and <7j the solvation parameter for the atom i. We choose the solvation parameter set OONS 
of Ref.— that is often used together with the ECEPP force field. The potential energy of 
the solvated molecule is then given by 

Etot = Eecepp/3 + E so i v (7) 

In such a detailed protein model, the various competing interactions lead model to an en- 
ergy landscape with a multitude of local minima separated by high-energy barriers. Canon- 
ical Monte Carlo or molecular dynamics simulations will likely get trapped in one of these 
minima and not thermalize within the available CPU time. Only recently, with the intro- 
duction of new and sophisticated algorithms such as generalized-ensemble techniques^ was 
it possible to alleviate this problem in protein simulations^. For simulating PTH(l-34) 
we have chosen one most commonly used generalized-ensemble technique, multicanonical 
sampling^. 

The multicanonical algorithm 12 assigns a weight w mu (E) oc l/n(E) to conformations 
with energy E. Here, n(E) is the density of states. A simulation with this weight leads to 
a uniform distribution of energy: 

P m u(E) oc n(E) w mu (E) = const . (8) 



Thus, the simulation generates a ID random walk in the energy space, allowing itself to 
escape from any local minimum. Since a large range of energies is sampled, re- weighting^ 
allows one to calculate thermodynamic quantities over a wide range of temperatures T by 

Jdx A{x) w-\E{x)) e-^> 
T ' $dxw~\E{x)) e-^) ' { ) 

where x stands for configurations, E(x) for its total potential energy E(x) = Eecepp/z{ x ) + 
E so iv{x) and j3 for the inverse temperature, (3 = l/ksT. 

Unlike in constant temperature simulations the weights are not a priori known in mul- 
ticanonical simulations. In fact, knowledge of the exact weights is equivalent to obtaining 
the density of states n(E), i.e., solving the system. However, for a numerical simulations 
estimators are sufficient as long as these do not deviate not too much from This is 

because the same weights that are used for the simulation appear also in the re-weighting 
procedure of Eq. El In the present study, we calculate these estimators from a preliminary 



26. All 



simulated annealing run of 80,000 MC-sweeps using the method described in Ref. 
thermodynamic quantities are then estimated from one production run of 1, 000, 000 sweeps, 
starting from a random initial configuration and after discarding 10, 000 sweeps for thermal- 
ization. We store in every fifth sweep for further analysis various physical quantities and 
the dihedral angles of the current configuration. Our error bars are estimated by dividing 
this time series of data into 8 bins of each 125, 000 sweeps. 



III. RESULTS AND DISCUSSION 



We start our analysis by calculating thermodynamic averages of the intramolecular en- 
ergy < E EC EPP/3 > (T) and the solvation energy < E SO lv > (T). Both quantities and 
the resulting total energy < Etot > (T) =< Eecepp/z + Esolv > (T) are displayed as 
function of temperature in Fig. 1. The thermal behavior of the peptide is characterized by 
a competition between intramolecular and solvation energy. While < E EC epp/3 > (T) de- 
creases with decreasing temperature, < E SOLV > (T) increases toward lower temperatures. 
The interplay of both terms leads to two temperature regimes that are separated by a steep 
decrease in the total energy < Etot > (T). The corresponding pronounced peak in the 
specific heat per residue 

C(T) = (3 2 (< E TOT > (T)— < E TOT > 2 (T)) /34, (10) 
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displayed in the inlet, marks the transition temperature at T c = 560 ± 10 K. 

The structural changes associated with this transition can be deduced from Fig. 2 where 
we display the average helicity < n# > (T) as a function of temperature. Here, we have 
defined as the number of residues whose pair of backbone dihedral angles (0, ip) takes 
values in the range: (—70° ± 30°, —37° ± 30°). We see from the plot of both quantities that 
the high-temperature (high-energy) region is characterized by configurations with vanishing 
helicity (~ 10%) while at low temperatures (and, correspondingly, low energies) configu- 
rations dominate that are almost completely helical (~ 90% of the residues are part of a 
helix). The pronounced peak at T c in the susceptibility (per residue) 

X (T) = (<n 2 H >-<n H > 2 ) /34, (11) 

shown in the inlet, is further proof for the sharp transition between low-energy helical states 
and high-energy disordered coil states. Associated with this helix-coil transition is also a 
decrease in the solvent accessible volume < V > (T) as calculated by the double cubic 
lattice method^! (data not shown). Above T c , the average volume is < V >~ 10000 A 3 , 
while below T c the volume is reduced to < V >~ 9000 A 3 . 

The modest decrease in < V > together with the large value of the helicity < nn > (T) 
indicate that below T c a single elongated helix is the dominant structure for PTH(l-34). 
Indeed, we find in our simulation as lowest-energy state an elongated helix with 31 residues 
part of the helix. This structure, shown in Fig. 3b, has not only the lowest total energy 
(Etot — —277.8 kcal/mol), but also the lowest intramolecular energy: E E cepp/3 = —136.5 
kcal/mol. It is very similar to the crystal structure of PTH(l-34) (PDB code 1ET1, displayed 
in Fig. 3a) where also 31 residues are part of an a-helix and whose energy after regularization 
with the program FANTOM 28 is E TOT = -277.9 kcal/mol (E ECE pp/3 = -187.0 kcal/mol). 
While our numerically determined structure has a slightly larger solvent-accessible surface 
area (A = 3860 A 2 ) than the crystal structure (A = 3410 A 2 ), it has 95% of all native 
contacts formed, i.e. 95 % of the contacts between residues found in the crystal structure 
exist also in our lowest-energy configuration. Here, we consider two residues in contact 
if the distance between their C a -atoms is less than 8 A, and the two residues are neither 
neighbor nor next-nearest neighbor in the peptide chain. Given that almost all native 
contacts are formed in the lowest-energy structure, it is not surprising that the root-mean- 
square deviation (rmsd) between the two structures is only 0.8 A for backbone atoms (2.3 
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A when all heavy atoms are taken into account). For comparison, the crystal structure of 
PTH(l-34) (1ET1) itself was solved at 0.9 A resolution 17 . We remark that recent structure 
determinations of the similar sized villin headpiece subdomain HP-36, a 36-residue peptide, 
by Energy Landscape Paving^ and parallel tempering^ were restricted to an accuracy of 
w 6 A. We believe that the higher accuracy of our PTH(l-34) results does not indicate 
any advantage of multicanonical sampling over the above methods but is rather due to the 
simpler geometry of PTH(l-34). 

At T = 300 K, our lowest energy structure appears with a frequency of (99 ± 0.5)%, 
i.e. almost all observed configurations resemble the crystal structure. The predominance of 
this structure is also reflected in the well-developed funnel in Fig. 4 where we display the 
projection of the free-energy landscape at T = 300 K on the number of native contacts. 
The free energy decreases rapidly with increasing number of native contacts. No indications 
for competing local minima that could act as traps are observed indicating a rather smooth 
funnel. On the other hand, at the transition temperature T c = 560 K, the free energy 
landscape (displayed in the inlet) is flat and configurations with small number of native 
contacts coexist with such that have many native contacts. 

However, while the crystal structure is in our simulation the dominant configuration 
at T = 300 K, it differs from the set of NMR-structures found at room temperature. In 
near-physiological solution, one observes instead two helices separated by a disordered and 
flexible region. We show in Fig. 3c as an example one of the resolved solution configura- 
tions (from lHPY)i&. The N-terminal helix ranges from GIU4 to Hisg and the C-terminal 
helix from Ser^ to Gln 2 g. Addition of trifluorethanol reduces hydrophobic interactions and 
increases the length of these helices^. Hence, our simulation of PTH(l-34) does not repro- 
duce the experimental results for that peptide in solution albeit protein-solvent interactions 
are considered in our energy function by an approximate term. Instead, our simulation 
favors the crystal structure of the peptide that is observed in membrane and hydrophobic 
environments. 

In order to understand in greater detail the relation between our simulation results and 
the NMR experiments, we plot in Fig. 5a for each residue the free energy difference AGj 
at T = 300 k between configurations with residue i part of an a-helix and such where that 
residue is not part of an a-helix. The free-energy differences are largest for residues Asni6 
-Lys27, an d it is for these residues that first helix formation is observed. A second cluster of 
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residues that have large free-energy differences are observed between Ile 5 and Asni . Both 
regions are separated by residues Leun - Leuis that have smaller free energy differences. The 
observed free-energy differences are strongly correlated with differences in the (potential) 
energy AE TOT that together with its two components E EC epp/3 and Esolv are plotted 
in Fig. 5b. Note, that the variations in the energy differences result from the Eecepp/z 
part, i.e. from the intramolecular interactions. The corresponding solvent energies favor in 
general residues that are not in a helical state but vary little with the residues. In addition, 
their magnitude is so small that it is difficult to distinguish in the figure between AE T ot 
and AEecepp/3- 

The position of the two helices in the solvent structure of Fig. 3c corresponds to the 
regions where in our simulation the measured free-energy differences and potential energy 
differences are large. In the NMR structures, the C-terminal helix is more stable than 
the N-terminal helix. Similarly, we find larger absolute values of AG; — 14 kcal/mol 
(AEtot ~ —39 kcal/mol) for residues Asni 6 to Lys 2 7 (with the maximal values at Arg 20 : 
AG = —19.4 kcal/mol and AEtot = —55.1 kcal/mol) compared with AGi ~ —12 kcal/mol 
(AEtot ~ — 36 kcal/mol) for residues lies to Asnio- On the other hand, the flexible region 
connecting the two helices in the NMR structure corresponds to a region of residues that 
have with AGi ~ —8 kcal/mol and AEtot ~ —25 kcal/mol considerably smaller free 
(potential) energy differences. The free-energy differences are smallest for Leun and Glyi 2 : 
AG^ ~ —6 kcal/mol and and AE T ot ~ 17 kcal/mol. The later result is not surprising giving 
the inherent flexibility of glycine (which, however, is part of the helix in our lowest-energy 
configuration). 

The observed variations in the free and potential energy differences suggest that we may 
find at higher temperatures configurations similar to the NMR structures. This is because 
the helix will be de-stabilized with increasing temperature, and more easily for residues 
Leun to Leuis than in the regions that corresponds to the two terminal helices. In order 
to test this conjecture, we show in Fig. 6 two quantities as function of temperature. One 
is the frequency of configurations that have a continuous helix stretching at least between 
lies and Lys 2 7 and are therefore similar to the crystal structure of PTH(l-34). The second 
quantity is the frequency of configurations that have helices stretching at least between lies 
and His 9 and between Ser 17 and Gln 29 , but are separated in-between by a non- helical flexible 
region. Hence, the later quantity measures the frequency of configurations that are similar 
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to the NMR structure. A typical example of this group of configurations is displayed in 
Fig. 3d. While this conformation is also a local minimum, its total energy Exot — — 201.4 
kcal/mol and intramolecular energy Eecepp/3 — —48.6 kcal/mol are much higher than the 
corresponding values for the lowest-energy configuration of Fig. 3b: Etot — —277.8 kcal/mol 
and E E cepp/3 = —135.5 kcal/mol. On the other hand, its solvation energy Esolv — —153.0 
kcal/mol is lower than that of Fig. 3b (Esolv — —141.3 kcal/mol) but the differences are 
smaller than the one in the ECEPP/3 term. 

Both kind of configurations appear at the helix-coil transition temperature T c = 560(10) 
K. Due to their higher entropy, configurations that resemble the NMR structures are slightly 
more common for temperatures in the range 520 K < T < 560 K than the ones that 
are similar to the crystal structure leading to a positive free energy difference AG that is 
displayed in the inlet of Fig. 6. At T = 520 K, 43(4)% of all configurations are similar to 
the NMR structures and 40(5)% resemble more the crystal structure. Below T = 520 K, the 
intramolecular energy that favors an extended single helix wins over the higher entropy of 
states with two separate helices. The resulting negative free energy difference AG leads to 
a decrease in the frequency of NMR-like structures, and their contribution is less than 1% 
at room temperature. 

We conjecture that the above relations hold also in nature. The elongated helix of the 
crystal structure is favored by the intramolecular energies and is the ground state in potential 
energy. Thermal fluctuations lead to the more flexible configurations with two helices that 
are observed for the soluted peptide. In a (hydrophobic) membrane environment or when 
binding to a receptor reduces the entropy of the molecule, PTH(l-34) stays in the 1-helix 
state that also seems to lead to increase its biological activity^. 

However, the energy function in our simulation overstabilizes the extended a-helix of 
the PTH(l-34) ground state (the crystal structure). Hence, structures that are found in 
solution with NMR experiments appear in our simulation with significant frequency only 
at temperatures more than 200 K above room temperature. Similarly, we find with T c = 
560(10) K a helix-coil transition temperature that is much higher than physiological relevant 
temperature range. These results clearly point out the limitations of our model. Since 
our data are consistent with what one would expect in a hydrophobic environment one 
can conjecture that these limitations of the energy function are mainly due to our solvent 
approximation. The small variation in the magnitude of the solvent energy in Figs. 1 and 
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5 (when compared with the ECEPP/3 term) suggests that the OONS term underestimates 
the protein-solvent interaction. This maybe due to the fact that our implicit solvent model 
does not account for the hydrogen bonds between the polypeptide and water molecules that 
in water compete with the characteristic intramolecular hydrogen bonding in an a-helix. 
As a consequence, a-helices are overstabilized, and our energy function rather models the 
peptide in a hydrophobic environment than in physiological solution. Another problem is 
that the solvation energy term of Eq. El describes actually a free energy and therefore should 
change with temperature. This effect is neglected in the OONS approximation. Taking 
such a temperature dependence into account and use of more sophisticated implicit solvent 
models would likely improve our results. However, we can also not exclude the possibility 
that the deviation from the NMR results is not duee to the solvent term bit that our force 
field, the ECEPP/3 term that describes the intramolecular interactions, biases toward helical 
conformations. 

It follows that other potential energy functions and implicit solvent models have to be 
chosen for a simulation of PTH(l-34) in solution. However, the failure of our energy function 
to model correctly the solvated molecule demonstrates also the advantages of our approach. 
Unlike the earlier simulated annealing simulations of Ref. |2jJ multicanonical sampling allows 
one to calculate accurate estimates of the frequency of states and other physical quantities 
at room temperature. In that way, we have been able to unveil in the present paper the 
limitations of our energy function. At the same time, our results allow us to understand 
the behavior of our peptide in a hydrophobic environment (since this is what our energy 
function models) and to reproduce the crystal structure with high accuracy. Hence, we 
have shown that multicanonical simulations are not only well-suited for simulations of small 
peptides but also for numerically more challenging molecules such as the 34-residue peptide 
PTH(l-34). 



IV. CONCLUSION 



In summary, we have performed all-atom simulations of PTH(l-34), the biologically active 
peptide- fragment 1-34 of the human parathyroid hormone. Protein-water interactions are 
approximated by a solvent-accessible surface term using the OONS parameter se^M. Our 
results rely on multicanonical sampling and demonstrate that this technique allows one 
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to overcome the multiple minima problem in simulations of this molecule. Unlike earlier 
numerical studies 21 that relied on simulated annealing we find a lowest-energy configurations 
that has a rmsd of only 0.8 A to the crystal structure. While previous applications of 
multicanonical sampling were restricted to small peptides with less than 20 residues, our 
results (albeit for a molecule with a rather simple structure) establish that the generalized- 
ensemble approach is also a useful tool for investigation of much larger polypeptides with 
30 — 40 residues. Applications to even larger molecules seem to be restricted less by the 
sampling technique but by the accuracy of the energy function. 
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Figure Captions: 



Fig. 1 Average intramolecular energy < E ECEPP / 3 >, solvent energy < E S olv > and total 
energy < E TOT >=< E ECE pp/ 3 + E SOLV > of PTH(l-34) as a function of temperature 
T. The inlet displays the specific heat G(T) as a function of temperature T. The data 
are calculated from a multicanonical simulation of 1,000,000 sweeps using a solvent 
accessible surface term to approximate protein-water interactions. 

Fig. 2 Average helicity < hh > as a function of temperature T. The inlet displays the 
susceptibility xCO as a function of temperature T. All data points are calculated from 
a multicanonical simulation of 1,000,000 sweeps using a solvent accessible surface term 
to approximate protein-water interactions. 

Fig. 3 (a) The crystal structure of PTH(l-34) (PDB code 1ET1); (b) The lowest-energy 
conformation of PTH(l-34) as determined from a multicanonical simulation with a 
solvent accessible surface term; (c) Solution structure of PTH(l-34) as determined 
by NMR (PDB-code 1HPY). (d) One of the configurations resembling the solution 
structure. Such configurations appear in our simulation with significant frequency only 
for temperatures closely to but below the helix-coil transition temperature T c = 560 
K. 

Fig. 4 Projection of the free-energy landscape (AG) of PTH(l-34) at T = 300 K on the 
number of native contacts n^c- The same quantity is plotted for the T = 560 K (the 
critical temperature) in the inlet. 

Fig. 5 (a) Free energy difference AGj between configurations with residue i part of an a- 
helix and such where residue i is not in a helical state, (b) The corresponding total (po- 
tential) energy differences AEtot (D), intramolecular energy differences AEecepp/z 
(o) and solvent energy differences AEsolv{x). 

Fig. 6 Frequency of PTH(l-34) configurations (1H) that resemble the crystal structure 
and frequency of states (2H) that are similar to the solution structures of the peptide. 
The free-energy difference AG between both sets of configurations is displayed in the 
inlet. All quantities are shown as function of temperature T. All our data points 
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are calculated from a multicanonical simulation of 1,000,000 sweeps using a solvent 
accessible surface term to approximate protein-water interactions. 
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